Practice Code: Day 4

Author

Constantin Manuel Bosancianu

Published

August 1, 2019

1 Introduction

We start as yesterday, by loading a few needed packages and defining a few custom functions that we use below.

Code
library(pacman)
p_load(tidyverse, ggeffects, texreg, arm, broom.mixed,
       ggthemes, interplot, HLMdiag, DHARMa, dotwhisker,
       knitr, kableExtra, magrittr)

We also define here a few helpful functions we will rely on in this code file.

Code
# Function from Zoltan Fazekas - as a user-defined function, it has to be
# re-loaded with every new R session.
fun_two_SD <- function(x) {
        (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))
}
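As a quick sanity check (on simulated data, so the values here are hypothetical), a variable rescaled by fun_two_SD() should end up with a mean of 0 and a standard deviation of 0.5, in line with Gelman's two-SD standardization proposal:

Code
set.seed(1234)
x <- rnorm(500, mean = 50, sd = 10)
z <- fun_two_SD(x)
mean(z)  # approximately 0
sd(z)    # 0.5, since we divide by twice the SD

This makes the coefficients of continuous predictors roughly comparable in magnitude to those of untransformed binary predictors.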

2 Data input and preparation

We’ll use the same data set as in the previous days, keeping only the variables we’re interested in. Compared to the previous days, I will add an additional level-2 predictor.

Code
df_issp <- readRDS("../02-data/01-ISSP.rds")

df_issp %<>%
    dplyr::select(cnt, year, country, poleff, female,
                  age10, educ, urban, incquart, ti_cpi,
                  gini10) %>%
    na.omit()

df_issp %<>%
    mutate(inc1 = case_when(incquart==1 ~ 1,
                            incquart %in% c(2,3,4) ~ 0),
           inc2 = case_when(incquart==2 ~ 1,
                            incquart %in% c(1,3,4) ~ 0),
           inc3 = case_when(incquart==3 ~ 1,
                            incquart %in% c(1,2,4) ~ 0),
           inc4 = case_when(incquart==4 ~ 1,
                            incquart %in% c(1,2,3) ~ 0))

2.1 Centering

Let’s repeat the centering procedures from yesterday, generate the last 3 models we looked at, and display the comparison table for the models.

Code
df_issp %<>%
    group_by(cnt) %>%
    mutate(age10CWC = fun_two_SD(age10),
           educCWC = fun_two_SD(educ),
           femCWC = fun_two_SD(female),
           urbanCWC = fun_two_SD(urban),
           inc1CWC = fun_two_SD(inc1),
           inc2CWC = fun_two_SD(inc2),
           inc3CWC = fun_two_SD(inc3),
           inc4CWC = fun_two_SD(inc4))
Code
df_agg <- df_issp %>%
    group_by(cnt) %>%
    summarise(ti_cpi = mean(ti_cpi, na.rm = TRUE),
              gini10 = mean(gini10, na.rm = TRUE)) %>%
    mutate(cpiCGM = fun_two_SD(ti_cpi),
           gini10CGM = fun_two_SD(gini10)) %>%
    dplyr::select(-ti_cpi, -gini10)
Code
df_issp <- left_join(df_issp, df_agg, by = c("cnt"))
rm(df_agg)

2.2 Initial specifications

We estimate 4 specifications of increasing complexity, starting with the null model.

Code
mlm.0 <- lmer(poleff ~ 1 + (1 | cnt),
              data = df_issp,
              control = lmerControl(optimizer = "bobyqa"))
mlm.1 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
                  urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
                  (1 | cnt),
              data = df_issp,
              control = lmerControl(optimizer = "bobyqa"))
mlm.2 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
                  urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
                  (1 + educCWC | cnt),
              data = df_issp,
              control = lmerControl(optimizer = "bobyqa"))
mlm.3 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
                  urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
                  educCWC * cpiCGM + (1 + educCWC | cnt),
              data = df_issp,
              control = lmerControl(optimizer = "bobyqa"))

3 Model fit

As you saw in the previous days, the summary() function gives you no indication of model fit. A set of dedicated functions exists for this.

Code
logLik(mlm.1)
'log Lik.' -39462.9 (df=11)
Code
logLik(mlm.2)
'log Lik.' -39362.13 (df=13)
Code
logLik(mlm.3)
'log Lik.' -39360.93 (df=14)

You also have functions for AIC and BIC (there is little need for a dedicated deviance function, since the deviance is simply \(-2 \times logLik\)).
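If you ever need the deviance, you can compute it directly from the log-likelihood; note that for these REML fits, logLik() returns the REML criterion:

Code
# Deviance is simply -2 times the log-likelihood
-2 * as.numeric(logLik(mlm.1))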

Code
AIC(mlm.1)
[1] 78947.8
Code
AIC(mlm.2)
[1] 78750.27
Code
AIC(mlm.3)
[1] 78749.86
Code
BIC(mlm.1)
[1] 79041.69
Code
BIC(mlm.2)
[1] 78861.23
Code
BIC(mlm.3)
[1] 78869.36

For a test of whether the differences in fit are statistically significant, we can turn to the anova() function.

Code
anova(mlm.1, mlm.2, mlm.3)
Data: df_issp
Models:
mlm.1: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + (1 | cnt)
mlm.2: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + (1 + educCWC | cnt)
mlm.3: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + educCWC * cpiCGM + (1 + educCWC | cnt)
      npar   AIC   BIC logLik deviance    Chisq Df Pr(>Chisq)    
mlm.1   11 78885 78979 -39431    78863                           
mlm.2   13 78689 78800 -39332    78663 199.4736  2  < 2.2e-16 ***
mlm.3   14 78684 78804 -39328    78656   7.2965  1   0.006909 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The function automatically re-fits every model in the comparison with full maximum likelihood (ML) rather than REML, and produces a comparison table for the 3 models.
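You can verify this re-fitting manually: re-estimating a model with ML should reproduce the logLik value shown in the anova() table, rather than the REML-based value reported above.

Code
# Re-fit model 1 with full ML instead of REML
mlm.1.ml <- update(mlm.1, REML = FALSE)
logLik(mlm.1.ml)  # matches the logLik column of the anova() table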

Quick questions for clarification:

  1. Why is \(DF=2\) for the comparison between Model 1 and Model 2? What are the extra 2 parameters that are estimated in Model 2?
  2. Which one is the better fitting model out of the 3?

As an additional specification, let’s also try a model that includes Gini as a predictor of the L1 intercept.

Code
mlm.4 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
                  urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
                  gini10CGM + educCWC * cpiCGM + (1 + educCWC | cnt),
              data = df_issp,
              control = lmerControl(optimizer = "bobyqa"))
summary(mlm.4)
Linear mixed model fit by REML ['lmerMod']
Formula: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC +  
    inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM +  
    (1 + educCWC | cnt)
   Data: df_issp
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 78724.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.3103 -0.6728 -0.0112  0.6700  8.7720 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 cnt      (Intercept) 0.07432  0.2726        
          educCWC     0.01094  0.1046   -0.29
 Residual             0.47063  0.6860        
Number of obs: 37628, groups:  cnt, 35

Fixed effects:
                Estimate Std. Error t value
(Intercept)     2.963145   0.046234  64.089
age10CWC        0.121616   0.007487  16.244
femCWC         -0.121016   0.007118 -17.001
educCWC         0.322669   0.019467  16.575
urbanCWC        0.044423   0.007207   6.164
inc2CWC         0.057079   0.008774   6.505
inc3CWC         0.113761   0.008949  12.712
inc4CWC         0.218900   0.009301  23.534
cpiCGM          0.267760   0.106317   2.519
gini10CGM      -0.063952   0.103126  -0.620
educCWC:cpiCGM  0.108093   0.038925   2.777

Correlation of Fixed Effects:
            (Intr) a10CWC femCWC edcCWC urbCWC in2CWC in3CWC in4CWC cpiCGM
age10CWC     0.000                                                        
femCWC       0.000  0.024                                                 
educCWC     -0.265  0.096 -0.003                                          
urbanCWC     0.000 -0.007 -0.013 -0.057                                   
inc2CWC      0.000  0.025  0.035 -0.039  0.005                            
inc3CWC      0.000  0.077  0.057 -0.071  0.002  0.515                     
inc4CWC      0.000  0.078  0.079 -0.115 -0.036  0.510  0.540              
cpiCGM       0.000  0.000  0.001 -0.001  0.000  0.001  0.001  0.000       
gini10CGM    0.000  0.000  0.001 -0.003  0.000  0.003  0.001  0.000  0.470
edcCWC:cCGM  0.000 -0.009 -0.005  0.001  0.001  0.000 -0.003 -0.007 -0.234
            g10CGM
age10CWC          
femCWC            
educCWC           
urbanCWC          
inc2CWC           
inc3CWC           
inc4CWC           
cpiCGM            
gini10CGM         
edcCWC:cCGM  0.006

Could the effect of Gini differ across education groups, though?

Code
mlm.5 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
                  urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
                  gini10CGM + educCWC * cpiCGM + educCWC * gini10CGM +
                (1 + educCWC | cnt),
              data = df_issp,
              control = lmerControl(optimizer = "bobyqa"))
summary(mlm.5)
Linear mixed model fit by REML ['lmerMod']
Formula: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC +  
    inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM +  
    educCWC * gini10CGM + (1 + educCWC | cnt)
   Data: df_issp
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 78719.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.3072 -0.6730 -0.0110  0.6703  8.7698 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 cnt      (Intercept) 0.073097 0.27036       
          educCWC     0.008184 0.09046  -0.27
 Residual             0.470626 0.68602       
Number of obs: 37628, groups:  cnt, 35

Fixed effects:
                   Estimate Std. Error t value
(Intercept)        2.963145   0.045854  64.621
age10CWC           0.121602   0.007484  16.248
femCWC            -0.121114   0.007117 -17.017
educCWC            0.323276   0.017324  18.661
urbanCWC           0.044417   0.007206   6.164
inc2CWC            0.056884   0.008774   6.483
inc3CWC            0.113726   0.008949  12.709
inc4CWC            0.219089   0.009300  23.558
cpiCGM             0.307930   0.106368   2.895
gini10CGM          0.019012   0.106285   0.179
educCWC:cpiCGM     0.046184   0.039708   1.163
educCWC:gini10CGM -0.124201   0.038787  -3.202

Correlation of Fixed Effects:
            (Intr) a10CWC femCWC edcCWC urbCWC in2CWC in3CWC in4CWC cpiCGM
age10CWC     0.000                                                        
femCWC       0.000  0.024                                                 
educCWC     -0.234  0.108 -0.003                                          
urbanCWC     0.000 -0.007 -0.013 -0.064                                   
inc2CWC      0.000  0.026  0.035 -0.044  0.005                            
inc3CWC      0.000  0.077  0.057 -0.080  0.002  0.515                     
inc4CWC      0.000  0.079  0.079 -0.129 -0.036  0.510  0.540              
cpiCGM       0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000       
gini10CGM    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.485
edcCWC:cCGM  0.000 -0.010 -0.003 -0.005  0.001  0.005  0.000 -0.006 -0.236
edCWC:10CGM  0.000 -0.001  0.005 -0.012  0.000  0.011  0.004  0.000 -0.117
            g10CGM eCWC:C
age10CWC                 
femCWC                   
educCWC                  
urbanCWC                 
inc2CWC                  
inc3CWC                  
inc4CWC                  
cpiCGM                   
gini10CGM                
edcCWC:cCGM -0.115       
edCWC:10CGM -0.242  0.495

Which one fits the data better?

Code
anova(mlm.3, mlm.4, mlm.5)
Data: df_issp
Models:
mlm.3: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + educCWC * cpiCGM + (1 + educCWC | cnt)
mlm.4: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM + (1 + educCWC | cnt)
mlm.5: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM + educCWC * gini10CGM + (1 + educCWC | cnt)
      npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)   
mlm.3   14 78684 78804 -39328    78656                        
mlm.4   15 78686 78814 -39328    78656 0.3115  1   0.576750   
mlm.5   16 78678 78815 -39323    78646 9.7776  1   0.001767 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4 Diagnostic checks

Let’s take Model 5, which seemed to be the most promising of all the specifications tested so far, at least judging by the indices of model fit discussed earlier today and the LRT conducted above.

We start by examining the Level-1 residuals. A convenient function for computing the within-group residuals is hlm_resid().

Code
df_resid <- hlm_resid(mlm.5,
                      level = 1)

4.1 Linearity

Look at a plot of residuals vs. the outcome…

Code
ggplot(data = df_resid,
       aes(x = poleff,
           y = .resid)) +
    geom_point() +
    theme_clean() +
  labs(x = "Political efficacy",
       y = "Residual")

The tilt is expected: residuals are necessarily correlated with the outcome. Getting rid of the tilt requires plotting residuals against fitted values instead. Try it and see what comes out.
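If you want to try it, here is a minimal sketch, relying on the .fitted column that hlm_resid() also returns:

Code
ggplot(data = df_resid,
       aes(x = .fitted,
           y = .resid)) +
    geom_point(alpha = 0.05) +
    theme_clean() +
    labs(x = "Fitted values",
         y = "Residual")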

… as well as residuals against the predictors at L1

Code
ggplot(data = df_resid,
       aes(x = age10CWC,
           y = .resid)) +
    geom_point(alpha = 0.05) +
    theme_clean() +
  labs(x = "Age (centered, in decades)",
       y = "Residual")

Code
ggplot(data = df_resid,
       aes(x = educCWC,
           y = .resid)) +
    geom_point() +
    theme_clean() +
  labs(x = "Education (centered)",
       y = "Residual")

With dichotomous predictors there is little scope for diagnosing a non-linear trend, so we need not generate these plots.

4.2 Homogeneity of variance across groups

This is the Bartlett-Kendall test we discussed.

Code
bartlett.test(formula = .resid ~ cnt,
              data = df_resid)

    Bartlett test of homogeneity of variances

data:  .resid by cnt
Bartlett's K-squared = 463.36, df = 34, p-value < 2.2e-16

Unfortunately, we soundly reject \(H_0\), which means that we do NOT have homogeneity of variances of residuals across countries.

Code
df_resid %>%
    group_by(cnt) %>%
    summarise(VAR = var(.resid)) %>%
    ungroup() %>%
    slice(1:15) %>%
    kable(caption = "Country-by-country variance of residuals",
          caption.above = TRUE,
          digits = 2,
          col.names = c("Country", "Residual variance"))
Country-by-country variance of residuals
Country Residual variance
AT 0.51
AU 0.46
BE 0.50
CH 0.38
CL 0.40
CZ 0.52
DE 0.52
DK 0.62
EE 0.38
ES 0.50
FI 0.52
FR 0.34
GB 0.43
GE 0.46
HR 0.42

If this were an analysis intended for publication, there would be little reason to advance further, but for didactic purposes we can proceed.

4.3 Normally-distributed errors

Code
ggplot(df_resid,
       aes(sample = .resid)) +
    stat_qq() +
    stat_qq_line() +
    theme_clean()

4.4 Level 2 residuals

We can do the same types of plots for the Level 2 residuals, only now we obviously get a single residual of each type per country.

Code
df_resid <- hlm_resid(mlm.5,
                      level = "cnt") %>%
    as.data.frame()
Code
ggplot(df_resid,
       aes(sample = .ranef.intercept)) +
    stat_qq() +
    stat_qq_line() +
    theme_clean()

Code
ggplot(df_resid,
       aes(sample = .ranef.educ_cwc)) +
    stat_qq() +
    stat_qq_line() +
    theme_clean()

Keep in mind, though, that these are not residuals in the proper sense of the word, as we don’t know what the “true” slope is in a country. Instead, each residual is a prediction of what this slope is, which comes with its own uncertainty interval.

Code
ranef(mlm.5)
$cnt
   (Intercept)      educCWC
AT -0.19875454  0.050986649
AU  0.15786534 -0.068155516
BE -0.12543206 -0.050565005
CH  0.21722101  0.018129943
CL -0.02668506  0.097934989
CZ -0.19839797  0.079336498
DE  0.05371625  0.057589852
DK  0.09637863 -0.016330974
EE -0.19037225  0.004434058
ES -0.11269092  0.163057678
FI -0.09856012  0.091370141
FR  0.46568328 -0.047214007
GB -0.03299656 -0.008672287
GE -0.12039667  0.026217023
HR -0.26846920 -0.020141842
IL -0.23871591  0.051822863
IN -0.03528847 -0.160744238
IS  0.42862890 -0.115941606
JP  0.04141765 -0.144071675
KR -0.19639951  0.019741834
LT -0.36996615  0.013633247
NL  0.14519744  0.094139612
NO  0.34176765 -0.048707860
NZ  0.09730485 -0.115032508
PH  0.15198693 -0.107164681
PL -0.35974976 -0.072951456
RU  0.10291872  0.036281355
SE  0.11884518 -0.001118316
SI -0.48044712  0.054427528
SK -0.24016026 -0.018662586
TR  0.33326551 -0.123082888
TW -0.04422412  0.047898821
US -0.01650215  0.092610027
VE  0.77736894  0.027902632
ZA -0.17535750  0.091042697

with conditional variances for "cnt" 
Code
se.ranef(mlm.5)
$cnt
   (Intercept)    educCWC
AT  0.02523909 0.04386562
AU  0.02398895 0.04219476
BE  0.01668830 0.03123589
CH  0.02169577 0.03897322
CL  0.02283165 0.04059436
CZ  0.02037335 0.03702294
DE  0.01780911 0.03305011
DK  0.01670809 0.03126832
EE  0.02997563 0.04966262
ES  0.01989647 0.03630310
FI  0.02207438 0.03951912
FR  0.02662697 0.04565077
GB  0.01889037 0.03475583
GE  0.01985476 0.03623972
HR  0.02674815 0.04580318
IL  0.02558785 0.04432109
IN  0.02109257 0.03809200
IS  0.02488781 0.04340218
JP  0.01992162 0.03634128
KR  0.01858292 0.03427530
LT  0.02515410 0.04375393
NL  0.01871383 0.03448035
NO  0.01956995 0.03580518
NZ  0.02891813 0.04843999
PH  0.01990484 0.03631581
PL  0.01574311 0.02967018
RU  0.02458232 0.04299530
SE  0.02451942 0.04291109
SI  0.02844191 0.04787619
SK  0.02176117 0.03906791
TR  0.02081882 0.03768745
TW  0.01879795 0.03461176
US  0.02021352 0.03678266
VE  0.02243743 0.04003739
ZA  0.01459662 0.02772829
Code
df_resid <- data.frame(cnt = rownames(se.ranef(mlm.5)$cnt),
                       int = ranef(mlm.5)$cnt$`(Intercept)`,
                       se.int = se.ranef(mlm.5)$cnt[ ,1],
                       edu = ranef(mlm.5)$cnt$educCWC,
                       se.edu = se.ranef(mlm.5)$cnt[ ,2])

df_resid %>%
    ggplot(aes(x = reorder(cnt, -int),
               y = int)) +
    geom_point(size = 3) +
    labs(x = "Country",
         y = "Residual intercept") +
    theme_clean() +
    geom_errorbar(aes(ymin = int - 1.96 * se.int,
                      ymax = int + 1.96 * se.int),
                  linewidth = 1.25,
                  width = 0) +
    coord_flip()

Venezuela might be an outlier here.

Code
df_resid %>%
    ggplot(aes(x = reorder(cnt, -edu),
               y = edu)) +
    geom_point(size = 3) +
    labs(x = "Country",
         y = "Residual education") +
    theme_clean() +
    geom_errorbar(aes(ymin = edu - 1.96 * se.edu,
                      ymax = edu + 1.96 * se.edu),
                  linewidth = 1.25,
                  width = 0) +
    coord_flip()

Things seem fine with the random effects for education.

We can continue with similar plots of predictors against residuals, to diagnose linearity, as well as visually assess constant variance.

5 Tasks

5.1 In class

Can you quickly extract, for each country, its values of cpiCGM and gini10CGM from the ISSP data? Then merge these into df_resid as additional columns, and use them to obtain 2 plots, with the predictors on the X-axis and the residuals on the Y-axis. All of the functions you need for this have been introduced already.

5.2 At home

Load the ISSP data again, and run a more complete specification at L1. Add to it religious attendance, as well as marital status, plus a quadratic term for age (\(age^2\)). Does this improve things in terms of residual variance across the countries?

6 Presenting results

At this point, after choosing the best-fitting model and making all the diagnostic checks (and adjustments) you need to make, you would report the results from your models. You can present the null model, or leave it for the appendix - your call. You would typically also present 1-2 intermediate specifications (perhaps gradually adding more categories of predictors: attitudinal, institutional, etc.), and then the final specification. Either in the main body of the paper, or in the appendix, you would also present a few sensitivity checks.

Suppose that we had determined above that Model 5 truly WAS the best that we could do in terms of specification. How to present it?

6.1 Tables of coefficients

The standard approach is to display the table of coefficients directly in the paper. Here, the functions available in the texreg package are invaluable, though other packages offer similar functionality.

Code
texreg(list(mlm.0, mlm.2, mlm.4, mlm.5),
       digits = 3,
       custom.model.names = c("Model 1","Model 2","Model 3","Model 4"),
       single.row = FALSE,
       custom.coef.names = c("(Intercept)", "Age (in decades)", "Gender (woman)",
                             "Education", "Urban residence", "Income: 2nd quantile",
                             "Income: 3rd quantile", "Income: 4th quantile",
                             "Corruption perceptions", "Gini (10-point units)",
                             "Education * Corruption perceptions",
                             "Education * Gini"),
       booktabs = TRUE,
       dcolumn = TRUE,
       use.packages = FALSE,
       caption = "Comparison of multilevel specifications for political efficacy",
       file = "../05-output/04-Table-mlm.tex")
Code
htmlreg(list(mlm.0, mlm.2, mlm.4, mlm.5),
        digits = 3,
        custom.model.names = c("Model 1","Model 2","Model 3","Model 4"),
        single.row = FALSE,
        custom.coef.names = c("(Intercept)", "Age (in decades)", "Gender (woman)",
                              "Education", "Urban residence", "Income: 2nd quantile",
                              "Income: 3rd quantile", "Income: 4th quantile",
                              "Corruption perceptions", "Gini (10-point units)",
                              "Education * Corruption perceptions",
                              "Education * Gini"),
        caption = "Comparison of multilevel specifications for political efficacy",
        file = "../05-output/04-Table-mlm.html")

6.2 Coefficient plots

Alternatively, you could choose to display coefficients as dot-and-whisker plots. Though there are canned functions available, the one I am most familiar with has severe limitations.

Code
dwplot(list(mlm.2, mlm.4, mlm.5),
       show_intercept = FALSE) +
    theme_clean()

This is why I show here the fully manual approach to plotting these quantities, even though it’s a bit long.

First step: tidy up model results and create an indicator variable that identifies the model from which they’re obtained.

Code
mod1tid <- tidy(mlm.2,
                effects = "fixed",
                conf.int = TRUE,
                conf.level = 0.95)
mod1tid <- mod1tid %>%
    dplyr::select(-c(effect)) %>%
    mutate(model = "model1")
mod2tid <- tidy(mlm.4,
                effects = "fixed",
                conf.int = TRUE,
                conf.level = 0.95)
mod2tid <- mod2tid %>%
    dplyr::select(-c(effect)) %>%
    mutate(model = "model2")
mod3tid <- tidy(mlm.5,
                effects = "fixed",
                conf.int = TRUE,
                conf.level = 0.95)
mod3tid <- mod3tid %>%
    dplyr::select(-c(effect)) %>%
    mutate(model = "model3")

df_modframe <- rbind(mod1tid, mod2tid, mod3tid)
rm(mod1tid, mod2tid, mod3tid)

Second stage: make variable names nice, and order coefficients based on the order in which they appear in the output of the most complete specification.

Code
df_modframe %<>%
    mutate(term = case_when(term %in% "(Intercept)" ~ "(Intercept)",
                            term %in% "age10CWC" ~ "Age (decades)",
                            term %in% "femCWC" ~ "Woman",
                            term %in% "educCWC" ~ "Education",
                            term %in% "urbanCWC" ~ "Urban residence",
                            term %in% "inc2CWC" ~ "2nd income quartile",
                            term %in% "inc3CWC" ~ "3rd income quartile",
                            term %in% "inc4CWC" ~ "4th income quartile",
                            term %in% "cpiCGM" ~ "Perception of corruption",
                            term %in% "gini10CGM" ~ "Gini (10-points)",
                            term %in% "educCWC:cpiCGM" ~ "Education * Perceptions",
                            term %in% "educCWC:gini10CGM" ~ "Education * Gini"))

vec_term <- df_modframe$term[df_modframe$model == "model3"]
df_modframe$term <- factor(df_modframe$term,
                           levels = vec_term[length(vec_term):1],
                           ordered = TRUE)
rm(vec_term)

Final stage: plot the coefficients.

Code
df_modframe %>%
    dplyr::select(-c(std.error, statistic)) %>%
    filter(!(term == "(Intercept)")) %>%
    ggplot(aes(x = term, y = estimate, color = model)) +
    geom_point(size = 3,
               position = position_dodge(width = 0.5)) +
    geom_errorbar(aes(ymin = conf.low,
                      ymax = conf.high),
                  width = 0,
                  linewidth = 1.25,
                  position = position_dodge(width = 0.5)) +
    geom_hline(yintercept = 0, linewidth = 1.25, linetype = "dashed",
               color = rgb(255, 0, 90, maxColorValue = 255)) +
      coord_flip() +
    labs(y = "Estimate",
         x = "Term") +
    theme_clean() +
    theme(axis.text.x = element_text(size = 14),
          axis.text.y = element_text(size = 14)) +
    scale_colour_discrete(name = "Models",
                          breaks = c("model1", "model2", "model3"),
                          labels = c("Model 1", "Model 2", "Model 3"))

Code
rm(df_modframe)

6.3 Marginal effects plot

Finally, if your focus is only on one of the effects, you can choose to present it using the standard marginal effects plot. Here as well it’s best to stick to a manual approach.

Code
pl1 <- ggpredict(mlm.5,
                 terms = "educCWC [-8, -4, 0, 4, 8, 12]",
                 type = "fe",
                 ci.lvl = 0.95)
ggplot(pl1,
       aes(x = x, y = predicted))  +
    geom_point(size = 3) +
    geom_errorbar(aes(ymin = conf.low,
                      ymax = conf.high),
                  width = 0,
                  linewidth = 1.25) +
    labs(y = "Political efficacy",
         x = "Education",
         title = "") +
    theme_clean() +
    scale_x_continuous(breaks = c(-8, -4, 0, 4, 8, 12),
                       labels = c("Very low","Low","Average","Above average",
                                  "High","Very high"))

Code
rm(pl1)

Alternatively, you can treat the variable as properly continuous.

Code
pl2 <- ggpredict(mlm.5,
                 terms = "educCWC",
                 type = "fe",
                 ci.lvl = 0.95)
ggplot(pl2,
       aes(x = x, y = predicted))  +
    geom_line(linewidth = 2) +
    geom_ribbon(aes(ymin = conf.low,
                    ymax = conf.high),
                alpha = 0.5) +
    labs(y = "Political efficacy",
         x = "Education",
         title = "") +
    theme_clean()

7 Package versions

Package versions used in this script.

Code
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_2.0.3      kableExtra_1.3.4    knitr_1.42         
 [4] dotwhisker_0.7.4    DHARMa_0.4.6        HLMdiag_0.5.0      
 [7] interplot_0.2.3     abind_1.4-5         ggthemes_4.2.4     
[10] broom.mixed_0.2.9.4 arm_1.13-1          lme4_1.1-31        
[13] Matrix_1.5-3        MASS_7.3-58.2       texreg_1.38.6      
[16] ggeffects_1.1.5     forcats_1.0.0       stringr_1.5.0      
[19] dplyr_1.1.0         purrr_1.0.1         readr_2.1.3        
[22] tidyr_1.3.0         tibble_3.1.8        ggplot2_3.4.0      
[25] tidyverse_1.3.2     pacman_0.5.1       

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0   minqa_1.2.5         colorspace_2.1-0   
 [4] ellipsis_0.3.2      sjlabelled_1.2.0    ggstance_0.3.6     
 [7] snakecase_0.11.0    parameters_0.20.2   fs_1.6.0           
[10] rstudioapi_0.14     listenv_0.9.0       furrr_0.3.1        
[13] farver_2.1.1        fansi_1.0.4         lubridate_1.9.1    
[16] xml2_1.3.3          codetools_0.2-18    splines_4.2.1      
[19] jsonlite_1.8.4      nloptr_2.0.3        interactionTest_1.2
[22] broom_1.0.3         dbplyr_2.3.0        compiler_4.2.1     
[25] httr_1.4.4          backports_1.4.1     assertthat_0.2.1   
[28] fastmap_1.1.0       gargle_1.3.0        cli_3.6.0          
[31] htmltools_0.5.4     tools_4.2.1         coda_0.19-4        
[34] gtable_0.3.1        glue_1.6.2          reshape2_1.4.4     
[37] Rcpp_1.0.10         cellranger_1.1.0    vctrs_0.5.2        
[40] svglite_2.1.1       nlme_3.1-161        insight_0.19.0     
[43] xfun_0.36           globals_0.16.2      rvest_1.0.3        
[46] timechange_0.2.0    lifecycle_1.0.3     diagonals_6.4.0    
[49] googlesheets4_1.0.1 future_1.30.0       scales_1.2.1       
[52] hms_1.1.2           parallel_4.2.1      yaml_2.3.7         
[55] stringi_1.7.12      highr_0.10          bayestestR_0.13.0  
[58] boot_1.3-28.1       rlang_1.0.6         pkgconfig_2.0.3    
[61] systemfonts_1.0.4   evaluate_0.20       lattice_0.20-45    
[64] htmlwidgets_1.6.1   labeling_0.4.2      tidyselect_1.2.0   
[67] parallelly_1.34.0   plyr_1.8.8          R6_2.5.1           
[70] generics_0.1.3      DBI_1.1.3           pillar_1.8.1       
[73] haven_2.5.1         withr_2.5.0         mgcv_1.8-41        
[76] datawizard_0.6.5    janitor_2.1.0       modelr_0.1.10      
[79] crayon_1.5.2        utf8_1.2.2          tzdb_0.3.0         
[82] rmarkdown_2.20      grid_4.2.1          readxl_1.4.1       
[85] reprex_2.0.2        digest_0.6.31       webshot_0.5.4      
[88] munsell_0.5.0       viridisLite_0.4.1